Phrase Pair Classification for Identifying Subtopics
Abstract
Automatic identification of subtopics for a given topic is desirable because it eliminates the need to manually construct domain-specific topic hierarchies. In this paper, we design features based on corpus statistics and use them to build a classifier that identifies (subtopic, topic) links between phrase pairs. We combine these features with commonly used syntactic patterns to classify phrase pairs from Computer Science and WordNet datasets. In addition, we show a novel application of our is-a-subtopic-of classifier to query expansion in Expert Search and compare it with pseudo-relevance feedback.
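The abstract names corpus statistics and syntactic patterns as feature sources but does not list the exact features or the learner. The sketch below is a minimal illustration under those assumptions and is not the paper's method. It uses two document-level conditional probabilities in the spirit of the subsumption heuristic, P(topic | subtopic) and P(subtopic | topic), one Hearst-style pattern ("<topic> such as ... <subtopic>"), and a small logistic regression trained by stochastic gradient descent. The corpus, the labeled pairs, and helper names such as `pair_features` and `train_logreg` are hypothetical.

```python
import math
import re
from typing import List, Sequence, Tuple

# Hypothetical toy corpus; a real system would use a large domain corpus.
DOCS: List[str] = [
    "topics in machine learning such as neural networks and decision trees",
    "neural networks are a popular machine learning model",
    "machine learning is used in computer vision",
    "decision trees are easy to interpret machine learning models",
    "computer vision research often uses neural networks",
    "databases store records",
]

# Hypothetical labeled (subtopic, topic, label) pairs; 1 means "is a subtopic of".
PAIRS: List[Tuple[str, str, int]] = [
    ("neural networks", "machine learning", 1),
    ("decision trees", "machine learning", 1),
    ("machine learning", "neural networks", 0),
    ("machine learning", "decision trees", 0),
    ("databases", "machine learning", 0),
]


def doc_freq(docs: Sequence[str], *phrases: str) -> int:
    """Count documents containing every given phrase (case-insensitive substring match)."""
    return sum(all(p.lower() in d.lower() for p in phrases) for d in docs)


def pair_features(docs: Sequence[str], sub: str, top: str) -> List[float]:
    """Assumed features for a candidate (subtopic, topic) pair:
    P(topic | subtopic), P(subtopic | topic), and a 0/1 flag for the
    Hearst-style pattern "<topic> such as ... <subtopic>"."""
    n_sub, n_top = doc_freq(docs, sub), doc_freq(docs, top)
    n_both = doc_freq(docs, sub, top)
    p_top_given_sub = n_both / n_sub if n_sub else 0.0
    p_sub_given_top = n_both / n_top if n_top else 0.0
    pattern = re.compile(
        rf"\b{re.escape(top.lower())}\s+such\s+as\s+[\w\s,]*?\b{re.escape(sub.lower())}\b"
    )
    has_pattern = any(pattern.search(d.lower()) for d in docs)
    return [p_top_given_sub, p_sub_given_top, 1.0 if has_pattern else 0.0]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Plain logistic regression fitted with stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Gradient of the log loss with respect to the linear score z.
            g = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict(w, b, x) -> bool:
    """True if the pair is classified as a (subtopic, topic) link."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5


if __name__ == "__main__":
    X = [pair_features(DOCS, s, t) for s, t, _ in PAIRS]
    y = [label for *_, label in PAIRS]
    w, b = train_logreg(X, y)
    for (s, t, _), x in zip(PAIRS, X):
        print(f"{s!r} is-a-subtopic-of {t!r}: {predict(w, b, x)}")
```

The asymmetry between the two conditional probabilities is what lets the classifier separate a (subtopic, topic) pair from its reversed (topic, subtopic) counterpart, while the pattern flag stands in for the syntactic evidence. Logistic regression is used here only as a simple, dependency-free classifier; any other learner could consume the same feature vectors.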
Similar resources
Investigating Embedded Question Reuse in Question Answering
This paper presents a novel question answering (QA) method that enables a QA system to improve performance by reusing information in the answer to one question to answer another, related question. Our analysis shows that a pair of questions in general open-domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...
A Bipartite Graph-Based Ranking Approach to Query Subtopics Diversification Focused on Word Embedding Features
Web search queries are usually vague, ambiguous, or tend to have multiple intents. Users have different search intents while issuing the same query. Understanding the intents through mining subtopics underlying a query has gained much interest in recent years. Query suggestions provided by search engines hold some intents of the original query; however, suggested queries are often noisy and con...
Why and How to Pay Different Attention to Phrase Alignments of Different Intensities
This work comparatively studies two typical sentence-pair classification tasks, textual entailment (TE) and answer selection (AS), observing that phrase alignments of different intensities contribute differently to these tasks. We address the problems of identifying phrase alignments of flexible granularity and pooling alignments of different intensities for these tasks. Examples for flexible g...
A Machine Learning Approach for QA and Novelty Tracks: NTT System Description
In one sense, the goals of QA and Novelty tasks are the same: extracting small document parts which are relevant to users’ queries. Additionally, the unit of extraction is almost always fixed in both tasks. For QA, an answer is a noun phrase in most cases, and for Novelty, a sentence is recognized as the basic information unit. This observation leads us to the following unified approach to both...
Query Subtopic Mining via Subtractive Initialization of Non-negative Sparse Latent Semantic Analysis
Ambiguous and multifaceted queries widely exist in academic and commercial search engines. Identifying the popular subtopics of queries is an important issue for search engines. In this paper, we propose a novel method to discover the popular subtopics for a given query. Our method first constructs a search behavior tripartite graph based on the search log data. Then, we utilize a subtractive i...